cancer-markers
Materials and methods
Metagenomic data
OGU table construction
All 627 stool metagenomic samples were downloaded from NCBI/EBI databases using kingfisher v0.4.1 (https://github.com/wwood/kingfisher-download). Read quality was assessed using FastQC v0.12.1 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc). Raw reads were processed using fastp v0.23.4 to remove low-quality sequences [Chen et al., 2018]. Human sequences present in the metagenomic samples were removed using HISAT2 [Kim et al., 2019] and the human genome GRCh37 (Release 47) (https://www.gencodegenes.org/human). Metagenomic assembly was performed using MEGAHIT v1.2.9 [Li et al., 2015], and only contigs longer than 1000 bp were retained. HISAT2 v2.2 was used to align metagenomic reads to the filtered contigs [Kim et al., 2019]. Binning was carried out in two stages: 1) metagenomic binning using MetaBAT2 v2.12.1 [Kang et al., 2019], MaxBin2 v2.2.7 [Wu et al., 2016], and SemiBin2 v2.1.0 [Pan et al., 2023]; 2) refinement of the resulting bins with DAS Tool v1.1.7 to enhance binning quality [Sieber et al., 2018]. Bin quality was assessed using CheckM v1.2.3 [Parks et al., 2015]. Bins were dereplicated at 98% nucleotide identity using dRep v3.4.5 [Olm et al., 2017]. Taxonomic annotation of bins was performed using GTDB-Tk v2.1.1 [Chaumeil et al., 2022] with GTDB r207 [Parks et al., 2022]. Samtools v1.17 [Li et al., 2009], bedtools v2.31.0 [Quinlan and Hall, 2010] and bbmap v39.06 [Bushnell, 2014] were used for additional data manipulation. InStrain v1.9.0 was used to obtain OGU abundance profiles [Olm et al., 2021].
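The contig length filter applied after assembly can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names `read_fasta` and `filter_contigs` are hypothetical, and the strict "greater than" comparison is an assumption based on the "> 1000 bp" wording above.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_contigs(handle, min_length=1000):
    """Keep contigs strictly longer than min_length bp (assumed reading of '> 1000 bp')."""
    return [(h, s) for h, s in read_fasta(handle) if len(s) > min_length]

# Toy assembly: contig "a" is exactly 1000 bp, "b" is 1001 bp, "c" is 4 bp.
toy_assembly = io.StringIO(
    ">a\n" + "A" * 1000 + "\n"
    ">b\n" + "C" * 600 + "\n" + "G" * 401 + "\n"
    ">c\nACGT\n"
)
kept = filter_contigs(toy_assembly)
print([header for header, _ in kept])
```

In this toy input only contig `b` passes, because a 1000 bp contig does not satisfy a strict "> 1000 bp" cut-off.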
Marker OGU discovery
OGUs associated with immunotherapy outcomes were identified following the analytical framework established in our previous work [Olekhnovich et al., 2023; Zakharevich et al., 2024]. Differential ranking analysis was first performed using Songbird [Morton et al., 2019] to identify MAGs whose abundance differed between response groups, applying a conservative threshold of absolute differential value > 0.3. For candidate MAGs meeting this criterion, we then calculated log-ratio abundances using Qurro [Fedarko et al., 2020] and assessed statistical significance with Wilcoxon rank-sum tests implemented in the R statistical environment. Biomarker selection used stringent cross-dataset validation criteria to ensure robust identification of clinically relevant microbial signatures. MAGs showing consistent positive associations with therapeutic response across multiple datasets were retained as potential beneficial biomarkers, while any evidence of a negative association with treatment outcome in any dataset led to automatic exclusion regardless of other positive associations. This approach enabled simultaneous identification of two clinically meaningful biomarker categories: microbial taxa positively correlated with successful immunotherapy outcomes and those associated with adverse therapeutic responses. Requiring consistent directional effects across independent cohorts keeps biomarker qualification rigorous and favors reproducibility.
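The cross-dataset selection rule can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the study. It assumes that positive differentials mean enrichment in responders, that the rule for adverse markers mirrors the rule for beneficial ones, and that "multiple datasets" means at least two. The name `classify_mag` and the parameter `min_datasets` are hypothetical. The log-ratio calculation and the Wilcoxon rank-sum test are omitted.

```python
THRESHOLD = 0.3  # absolute differential cut-off quoted in the text

def classify_mag(differentials, threshold=THRESHOLD, min_datasets=2):
    """Classify one MAG from its per-dataset differentials.

    differentials: mapping of dataset name -> differential value,
    where a positive value is assumed to mean enrichment in responders.
    min_datasets: hypothetical reading of "multiple datasets".
    """
    up = sum(1 for d in differentials.values() if d > threshold)
    down = sum(1 for d in differentials.values() if d < -threshold)
    # Retain only MAGs with a consistent direction and no opposing signal.
    if up >= min_datasets and down == 0:
        return "R-associated"
    if down >= min_datasets and up == 0:
        return "NR-associated"
    return "excluded"

# Toy values for three hypothetical datasets.
print(classify_mag({"A": 0.8, "B": 0.5, "C": 0.1}))
print(classify_mag({"A": 0.8, "B": 0.5, "C": -0.6}))
```

The second toy MAG is excluded even though it passes the threshold in two datasets, because a single opposing signal is enough to reject it.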
lmer multicomp
logistic regression
clusterProfiler
ggseavis
Functional profiling of marker OGU
MetaCerberus
logistic regression
clusterProfiler
References
- Chen, Shifu, et al. “fastp: an ultra-fast all-in-one FASTQ preprocessor.” Bioinformatics 34.17 (2018): i884-i890.
- Li, Dinghua, et al. “MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph.” Bioinformatics 31.10 (2015): 1674-1676.
- Kim, Daehwan, et al. “Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.” Nature biotechnology 37.8 (2019): 907-915.
- Kang, Dongwan D., et al. “MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies.” PeerJ 7 (2019): e7359.
- Wu, Yu-Wei, Blake A. Simmons, and Steven W. Singer. “MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets.” Bioinformatics 32.4 (2016): 605-607.
- Pan, Shaojun, Xing-Ming Zhao, and Luis Pedro Coelho. “SemiBin2: self-supervised contrastive learning leads to better MAGs for short- and long-read sequencing.” Bioinformatics 39.Supplement_1 (2023): i21-i29.
- Sieber, Christian MK, et al. “Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy.” Nature microbiology 3.7 (2018): 836-843.
- Parks, Donovan H., et al. “CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes.” Genome research 25.7 (2015): 1043-1055.
- Olm, Matthew R., et al. “dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication.” The ISME journal 11.12 (2017): 2864-2868.
- Chaumeil, Pierre-Alain, et al. “GTDB-Tk v2: memory friendly classification with the genome taxonomy database.” Bioinformatics 38.23 (2022): 5315-5316.
- Parks, Donovan H., et al. “GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy.” Nucleic acids research 50.D1 (2022): D785-D794.
- Li, Heng, et al. “The sequence alignment/map format and SAMtools.” Bioinformatics 25.16 (2009): 2078-2079.
- Quinlan, Aaron R., and Ira M. Hall. “BEDTools: a flexible suite of utilities for comparing genomic features.” Bioinformatics 26.6 (2010): 841-842.
- Bushnell, Brian. “BBMap: a fast, accurate, splice-aware aligner.” (2014).
- Olm, Matthew R., et al. “inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains.” Nature Biotechnology 39.6 (2021): 727-736.
- Olekhnovich, Evgenii I., et al. “Consistent stool metagenomic biomarkers associated with the response to melanoma immunotherapy.” mSystems 8.2 (2023): e01023-22.
- Zakharevich, Natalia V., et al. “Systemic metabolic depletion of gut microbiome undermines responsiveness to melanoma immunotherapy.” Life Science Alliance 7.5 (2024).
- Morton, James T., et al. “Establishing microbial composition measurement standards with reference frames.” Nature communications 10.1 (2019): 2719.
- Fedarko, Marcus W., et al. “Visualizing ’omic feature rankings and log-ratios using Qurro.” NAR genomics and bioinformatics 2.2 (2020): lqaa023.
- Bolyen, Evan, et al. “Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2.” Nature biotechnology 37.8 (2019): 852-857.
Results
Data overview
The analysis incorporated 627 gut metagenomic profiling samples obtained from 11 independent external datasets. Patients were stratified by immunotherapy response into two groups: responders (R group, n = 365; 58.2%) and non-responders (NR group, n = 262; 41.8%). Response assessment followed RECIST 1.1 criteria, with the R group including patients showing complete response (CR), partial response (PR), or stable disease (SD) at 6-month follow-up, while the NR group comprised exclusively progressive disease (PD) cases. The study cohort received various immunotherapy regimens, including anti-PD1, anti-CTLA4, or combination therapies. Cancer type distribution revealed melanoma predominance (n = 456; 72.7%), followed by gastrointestinal cancers (n = 82; 13.1%), non-small cell lung cancer (n = 15; 2.4%), breast cancer (n = 4; 0.6%), ovarian cancer (n = 2; 0.3%), and other malignancies (n = 68; 10.8%). All samples were collected prior to treatment initiation to evaluate baseline microbiota status.
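The response stratification described above can be written as a small mapping from RECIST 1.1 categories to study groups. This is a minimal sketch: the function name `response_group` and the handling of unknown codes are assumptions, and the sketch does not model the 6-month follow-up timing.

```python
RESPONDER_CODES = {"CR", "PR", "SD"}   # complete response, partial response, stable disease
NON_RESPONDER_CODES = {"PD"}           # progressive disease

def response_group(recist_code):
    """Map a RECIST 1.1 category to the R or NR group used in this study."""
    code = recist_code.strip().upper()
    if code in RESPONDER_CODES:
        return "R"
    if code in NON_RESPONDER_CODES:
        return "NR"
    # Assumption: any other value is treated as an error rather than silently grouped.
    raise ValueError(f"Unrecognized RECIST category: {recist_code!r}")

print(response_group("SD"), response_group("PD"))
```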